Survival prediction landscape: an in-depth systematic literature review on activities, methods, tools, diseases, and databases

Survival prediction integrates patient-specific molecular information and clinical signatures to forecast the anticipated time of an event, such as recurrence, death, or disease progression. Survival prediction proves valuable in guiding treatment decisions, optimizing resource allocation, and interventions of precision medicine. The wide range of diseases, the existence of various variants within the same disease, and the reliance on available data necessitate disease-specific computational survival predictors. The widespread adoption of artificial intelligence (AI) methods in crafting survival predictors has undoubtedly revolutionized this field. However, the ever-increasing demand for more sophisticated and effective prediction models necessitates the continued creation of innovative advancements. To catalyze these advancements, it is crucial to bring existing survival predictors knowledge and insights into a centralized platform. The paper in hand thoroughly examines 23 existing review studies and provides a concise overview of their scope and limitations. Focusing on a comprehensive set of 90 most recent survival predictors across 44 diverse diseases, it delves into insights of diverse types of methods that are used in the development of disease-specific predictors. This exhaustive analysis encompasses the utilized data modalities along with a detailed analysis of subsets of clinical features, feature engineering methods, and the specific statistical, machine or deep learning approaches that have been employed. It also provides insights about survival prediction data sources, open-source predictors, and survival prediction frameworks.


Introduction
According to World Health Organization (WHO), around ten thousand diseases have been discovered and each disease has unique characteristics, symptoms, and implications on human health (Haendel et al., 2020).Millions of people died from such diseases in the span of years 2000 to 2019, while cancers, cardiovascular, and infectious diseases persisted as the leading causes of mortality (Jamison, 2018;World Health Organization, 2020).Extensive research on the intersection of life and technology has yielded a wide range of therapies and medications for various well-known diseases [National Research Council (US), 2010].However, the core idea behind traditional therapies and medications is based on the "one-size-fits-all" (Sellin, 2015).In this paradigm, a single drug is supposed to effectively treat a medical condition across a variety of patient cohorts i.e., children, old and young populations (Al-Lazikani et al., 2012;Sellin, 2015).Indepth exploration and understanding of living organisms' inherent biological processes reveal that high variability in genetics and drug responses make one-size-fits-all medication ineffective (Al-Lazikani et al., 2012;Sellin, 2015).
The groundbreaking discoveries of the factors contributing to the limited effectiveness of generalized medications marked the inception of the era of precision medicine (Ashley, 2016;Kosorok and Laber, 2019).Precision medicine offers customization in tailored medical treatments based on an individual's unique genetic makeup, and optimization in drug selection and dosage based on the individual's lifestyle, and environmental factors (Farrokhi et al., 2023).Precision medicine's adoption and effectiveness have been significantly enhanced by the accurate, cost-effective, and large-scale analysis of molecular information obtained through next-generation sequencing (Kamps et al., 2017).
In the realm of precision medicine, survival prediction plays a pivotal role in tailoring medical treatments to individual needs (Billheimer et al., 2014;Tsimberidou et al., 2019).Survival prediction categorizes patients into distinct risk groups that enhance the efficiency of resource allocation for the patients who are likely to gain the most benefit from specific treatments (Billheimer et al., 2014;Tsimberidou et al., 2019).It also enables counseling of patients and their families by predicting the expected course of the disease and potential challenges (Billheimer et al., 2014).In addition to medical treatments, survival prediction offers multiple advantages in research, particularly in the area of biomarker discovery and disease understanding (Chen et al., 2018;Sarma et al., 2020).Survival prediction models provide useful information about the correlation between different features and clinical outcomes.This correlation information enables the identification of novel biomarkers associated with disease prognosis (Sarma et al., 2020).Moreover, researchers leverage survival prediction to unravel disease heterogeneity which helps to identify distinct subtypes with different survival profiles (Hao et al., 2023).This knowledge not only aids in the stratification of homogeneous patients in clinical trials but also validates therapeutic targets by assessing their relevance in predicting patient outcomes (Glare et al., 2003).Furthermore, it enables the longitudinal monitoring of disease progression that helps to explore critical time points and progression patterns (Carobbio et al., 2020).
To expedite advancements in survival prediction research, researchers are harnessing the capabilities of AI algorithms by utilizing extensive survival-related data from public databases such as the Cancer Genome Atlas Program (TCGA) (Tomczak et al., 2015), and NCI Genomic Data Commons (GDC) (Jensen et al., 2017;Shen et al., 2019;Malik et al., 2021;Mirbabaie et al., 2021;Arjmand et al., 2022;Fan et al., 2023;Pellegrini, 2023).In addition, the diversity and heterogeneity of diseases hinder the development of a universally applicable survival prediction pipeline (Kourou et al., 2015;Hao et al., 2023).
Driven by the necessity for disease-specific predictors, there is a concerted effort to develop more accurate and powerful predictive tools (Baek and Lee, 2020;Jiang et al., 2020;Benkirane et al., 2023).Figure 1 illustrates that for the advancement of survival predictors, public databases provide a spectrum of clinical data (Jung et al., 2023;Qian et al., 2023) and encompass nine diverse omics data modalities, including gene expression (mRNA), micro RNA (miRNA), DNA methylation, copy number variation (CNV), long non-coding RNA (lncRNA), proteomics, metabolic, whole exome sequencing (WES) and mutation (Baek and Lee, 2020;Malik et al., 2021;Han et al., 2022;Jiang et al., 2022).In each data modality, there exists an array of missing values that hinder survival predictors learning.Extensive research is being conducted to impute missing values by using different techniques such as deletion, multiple, K-nearest neighbor (KNN), and median imputation (Van Buuren et al., 1999;García-Laencina et al., 2015;Chai et al., 2021b).In addition, various normalization methods are also being used to normalize feature space such as quantile (Zhao et al., 2020), variance threshold (Bolstad et al., 2003), and rank normalizations (Ni and Qin, 2021).
In the development of survival prediction pipelines, researchers are trying to unlock the potential of various data modalities by assessing predictor performance with individual modalities and combinations of multiple data modalities across diverse types of diseases (Lee et al., 2020;Hao et al., 2023;Pellegrini, 2023).When data from different modalities is combined, survival predictors' input feature space becomes very large which impedes the performance of AI approaches (Feldner-Busztin et al., 2023).Researchers are trying to explore feature engineering approaches such as random forest importance (RFI), and recursive feature elimination (RFI) (Wang et al., 2022), principal component analysis (PCA) (Lv et al., 2020;Jiang et al., 2022), non-negative matrix factorization (NMF) (Tang et al., 2021), and autoencoders (AEs) (Li et al., 2020;Wang et al., 2020;Owens et al., 2021).Moreover, in an end-to-end survival predictive pipeline, apart from the selection of appropriate data and feature engineering strategy, designing appropriate survival prediction models is also an active area of research (Deepa and Gunavathi, 2022).
Under different aforementioned directions, the recent 3 years have witnessed around 74 different survival predictors for different diseases.To further accelerate and expedite the development of more powerful predictors, in the last 10 years, from time to time, researchers have published 22 different review articles.These articles primarily aim to summarize the latest trends and developments in data modalities, feature engineering methods, and AI models specifically related to survival prediction.However, the focus of these reviews is often constrained to either a singular disease or multiple subtypes of cancer, highlighting a limited scope within the broader landscape of survival prediction research (Herrmann et al., 2021;Pobar et al., 2021;Boshier et al., 2022;Deepa and Gunavathi, 2022;Feldner-Busztin et al., 2023;Rahimi et al., 2023).More comprehensive details about the scope of existing review articles in terms of contributions and drawbacks are summarized in Table 2 and Section 3. Following the need for a comprehensive review article for survival prediction, the contributions of this paper are manifold: • It consolidates a diverse array of 22 survival prediction review papers, bringing together their scopes and limitations under a unified umbrella.This compilation serves as a valuable resource for researchers seeking high-level insights and pertinent information in the field.

FIGURE
An end-to-end survival prediction pipeline.
• It provides comprehensive insights into 74 survival prediction articles published between 2020 and 2023.
The objective is to delve into diverse aspects of the field, extract and furnish useful information from these articles under the following different research questions (RQs) and objectives: (i)

Background
Survival prediction makes use of patient-specific molecular information and clinical signatures to forecast a wide range of events at particular time intervals (Pellegrini, 2023).The most common events include recurrence, metastasis, response, hospitalization, recovery, and progression of a disease.Some of these events represent similar contexts, i.e., metastasis and progression, both contribute to the overall progression of the condition/cancer (Murthy et al., 2004).Survival prediction events are generally categorized into 4 different survival endpoints namely, overall survival (OS) (Driscoll and Rixe, 2009), diseasefree survival (DFS) (Sargent et al., 2005), progression-free survival (PFS) (Gyawali et al., 2022), and biochemical recurrence (BC) (Boorjian et al., 2011).Survival endpoints serve as crucial measures for assessing the outcomes of interventions, indicating the duration until specific events occur.Therefore, events are essentially the occurrences that contribute to the survival endpoints (Fiteni et al., 2014).These endpoints are critical to examine the trajectory of a particular disease (Fiteni et al., 2014;Gyawali et al., 2022).These survival endpoints are clearly defined in Table 1.
Survival prediction is time to event approach with two distinct aspects, i.e., survival and hazard function (Kleinbaum and Klein, 1996).Survival function describes the probability that a subject survives longer than some specified time t.Mathematically, it is expressed as: where T is the random variable for survival time, t is a specific value of interest for T. For instance, S(10) represents the probability of survival beyond 10 years without experiencing a specific event.
As time passes, S(t) decreases, reflecting the reduction in the probability of surviving without the occurrence of event E up to time t.
In comparison, the hazard function illustrates the probability of an event E occurring at a specific time interval ( t) with a prior assumption that the event has not taken place.The probability that the event E occurs within a very small time interval t around time t is given by the conditional probability: Dividing this probability by the length of the time interval ( t) gives the rate of occurrence of the event at time t.The limit as the time interval ( t) approaches zero gives the instantaneous rate of occurrence at time t.Mathematically, this is represented as: where f (t) represents the probability density function of survival time.Thus, survival function S(t) shows that the subject survives beyond a specific time point and hazard function h(t) complements this by providing a risk rate that a patient does not survive in a specific time interval conditioned on having survived thus This review paper focuses solely on current trends in survival prediction, omitting basic terminologies and mathematical formulations.For a concise mathematical overview, readers are advised to consult earlier review papers (Lee and Lim, 2019;Kvamme and Borgan, 2021).
far.Moreover, S(t) is always monotonic in nature, however h(t) is classically assumed to follow increasing Weibull, decreasing Weibull, or lognormal survival curves (Kleinbaum and Klein, 1996;Murthy et al., 2004).

A look-back into existing review studies
In recent years multiple review papers have been published and the objective of each review revolves around summarizing fundamental concepts in survival prediction and identifying trends in statistical, ML, and DL algorithms that have been utilized in the development of survival predictors.Table 2 illustrates a high-level overview of the existing 22 review articles in terms of their review scope and limitations.This comprehensive summary aims to assist researchers in locating specific information within relevant articles more effectively.
On the other hand, in the realm of disease specific survival predictors scope of existing review papers is limited.For instance, eight papers only summarize survival predictors on single disease or subtype of cancer, i.e., cervical cancer (Rahimi et al., 2023), glioblastoma (Tewarie et al., 2021), esophageal adenocarcinoma (Boshier et al., 2022), esophageal and gastroesophageal junction cancer (Gupta et al., 2018), head and squamous cell carcinoma (Mo et al., 2022), palliative cancer patients (Pobar et al., 2021), cardiovascular diseases (CVD) (Westerlund et al., 2021;Kresoja et al., 2023), and schizophrenia (Guan et al., 2022).Although four papers cover multiple subtypes of cancer but they cover only handful of eight different subtypes such as, breast, lung, gastric, colon, esophageal, ovarian cancers and so on.
While the scope of survival prediction extends beyond multiple diseases, existing review papers fall short to summarize current trends of data modalities, feature engineering approaches and survival prediction models.For example, Deepa and Gunavathi (2022) specifically address the primary categories of data modalities used for survival prediction, namely multiomics and clinical data.However, the review does not extensively explore trends and patterns related to the nine different omics types i.e., gene expression (mRNA), micro RNA (miRNA), methylation, copy number variation (CNV), whole exome sequencing (WES), long noncoding RNA (lncRNA), mutation, metabolic, and proteomics, or clinical features associated with distinct cancer subtypes.Similarly, Westerlund et al. (2021) do not explore the potential of multiomics data in terms of cardiovascular diseases.In addition, various review papers completely neglect to address feature engineering in survival prediction (Ahmed, 2005;Bashiri et al., 2017;Gupta et al., 2018;Pobar et al., 2021;Kantidakis et al., 2022;Rahimi et al., 2023).For instance, Feldner-Busztin et al. (2023) despite their focus on dimensionality reduction, fall short in providing a comprehensive summary of current trends in feature engineering approaches with respect to diseases and data modalities.Furthermore, a small portion of these review papers cover details of few state of the art survival prediction models (Ahmed, 2005;Kantidakis et al., 2022;Wiegrebe et al., 2023).While current review papers summarize survival prediction pipelines partially, there is a necessity to bring diverse information into a unified platform which offers comprehensive insights into patterns and trends associated with survival prediction pipelines.

Methodology
This section explains different steps or stages of preferred reporting items for systematic review and meta-analyses (PRISMA) strategy (Moher et al., 2010), which is used to gather relevant papers on survival analysis.Figure 2 provides a visual representation of various stages form PRISMA that are summarized in the following subsections.

. Search strategy
In Figure 2, the identification stage illustrates combinations of different keywords that are used to search research articles.The keywords block has two different types of operators "∧" and "∨" operators.On the basis of these operators one keyword from each block is selected and various search queries are formulated such as,

"SURVIVAL PREDICTION AND AI AND OMICS", "SURVIVAL PREDICTION AND AI AND Multiomics", "SURVIVAL Machine
Learning AND OMICS", and so on.These queries are utilized in literature search engines like lens (https://www.lens.org/),and Google Scholar for literature search from Jan 2020 to Jul 2023.

. Eligibility and screening strategy
With an aim to retain literature related to survival prediction, two different screenings are performed on the basis of the following criteria; • Articles that use only image-based datasets for survival prediction.
• Articles that do not make use of ML, DL, or statistical methods for survival prediction.
• Articles with closed access.
Initially, guided by the title and abstract of the articles, more than 800 studies are discarded.Subsequently, at the final step, based on a comprehensive review of the full text a second screening is performed, resulting in the exclusion of an additional 20 studies.Ultimately, 90 papers are selected for the final comparison and discussion of survival prediction.

Results
. RQ I, II, III: survival predictors distribution analysis across diseases and survival endpoints The primary aim of this section is to summarize the distribution of survival predictors across various diseases and survival endpoints.Predictors distribution analysis under individual diseases offers insights into the most active trends of predictors associated with specific diseases.This consolidated distribution provides a centralized platform to access valuable information about their disease of interest.Similarly, examining the distribution of articles across survival endpoints is valuable for identifying current trends in forecasting multiple events.This approach not only enhances our understanding of the current state of predictive modeling but also facilitates researchers in efficiently accessing information specific to their desired endpoints.Through this exploration, we aim to contribute to a deeper understanding of the diverse landscape of survival prediction research and its applications across various diseases and endpoints.
To date, approximately more than 100 different cancer subtypes have been identified (Grever et al., 1992).However, a deeper analysis of the last 3 years reveals that survival prediction models have been developed for only 40 distinct cancer subtypes, as outlined in Table 3.Among 36 different subtypes, most of the predictors have been designed for breast cancer, lung adenocarcinoma, ovarian cancer, and glioblastoma.On the other hand, seven different predictors have been designed for pancancer.Notably, there is a difference between other cancer types and pan-cancer because under this paradigm predictors simultaneously deal with multiple cancer subtypes.Pan-cancer based survival prediction entails predicting patient survival outcomes using data and models applicable to various cancer types (Fan et al., 2023).Instead of focusing on just one type of cancer, this approach draws on data from multiple cancers to identify shared patterns and markers that influence survival.By combining a diverse array of genetic, molecular, and clinical features that are common across different cancers, this method aims to improve the accuracy of survival predictions (Wu et al., 2023).For the development of pan-cancer based predictors, there exists public data having more than 30 distinct cancer subtypes (Liu et al., 2018).However, researchers are utilizing different subsets for the development of predictors (Fan et al., 2023).Figure 3 provides an overview of multiple survival prediction studies that encompass a range of cancer subtypes, either within a pan-cancer context or within the context of predicting survival for different subtypes.A total of 14 studies have taken into account multiple cancer subtypes whereas the majority of the studies have only covered only a single type of cancer subtype such as colorectal cancer (Willems et al., 2023), lymphoma (Li et al., 2023), colon adenocarcinoma (Lv et al., 2020), gastric cancer (Li et al., 2020), and so on.
Figures 4, 5 illustrate predictors distribution across survival endpoints.A majority of studies 67 (76%) have OS as an endpoint of survival prediction (Chai et al., 2021a;Abdelhamid et al., 2022;Benkirane et al., 2023;Bhat and Hashmy, 2023), whereas eight studies have incorporated multiple survival endpoints in their analysis.Out of eight studies, three studies have incorporated DFS and BC (Lee and Wang, 2003;Baek and Lee, 2020;Pellegrini, 2023).Two studies have incorporated OS, DFS, and PFS (Tan et al., 2020;Tang et al., 2021) and two studies have OS, and PFS as the survival endpoints (Jiang et al., 2022;Chauhan et al., 2023), one focuses on OS and DFS (Pant et al., 2023).A single study has focused on DFS only (Manganaro et al., 2023), and two only on BC (Li et al., 2021;Vahabi et al., 2021).The rest of studies either did not explicitly specify their endpoints for survival prediction or predominantly concentrated on predicting patients' survival outcomes without a specific focus on distinct survival endpoints.

. RQ IV: survival prediction data availability in public and private sources and opportunities for development of predictors
Survival prediction models development relies on the quality and quantity of annotated data, which is generated through extensive wet lab experiments.Experimental findings are stored in different types of databases that open new doors for the development of survival prediction applications.However, there exist multiple databases and each database encompasses particular diseases and modality specific survival data.For instance, CGGA (Zhao Z. et al., 2021) focuses on brain tumors, and MESA (Bild et al., 2002) contains data related to atherosclerosis.To accelerate the development of more competent survival predictors, it is essential to summarize which database contains which type of disease and what data modalities.In the highlight of research question IV, Table 4 illustrates public databases details in terms of diseases and data modalities they offer.
A deeper analysis of existing survival predictors reveals that among the 90 studies 58 utilized publicly accessible data from three key databases: the Cancer Genome Atlas Program (TCGA) (Tomczak et al., 2015), NCI Genomic Data Commons (GDC) (Jensen et al., 2017), and the Gene Expression Omnibus (GEO) (Clough and Barrett, 2016;Chai et al., 2021a;Hu Q. et al., 2021;Poirion et al., 2021;Zhao L. et al., 2021;Han et al., 2022;Jiang et al., 2022;Redekar et al., 2022;Wu and Fang, 2022;Wu et al., 2022;Zhang R. et al., 2022).Apart from public databases, there also exist private databases that have been utilized in existing survival prediction studies (Vahabi et al., 2021;Feng et al., 2022;Miao et al., 2022;Richard et al., 2022;Chauhan et al., 2023;Lee et al., 2023;Moreno-Sanchez, 2023).However, these private databases often restrict data access and may require extensive research proposals for data retrieval.Among these databases commonly used databases are Heidelberg University Hospital (Jung et al., 2023), COMBO-01 (Zhou et al., 2023), Life cohort (Unterhuber et al., 2021), and UNOS (Kantidakis et al., 2020;Raju and Sathyalakshmi, 2023).The reliance on private databases for survival prediction creates significant hurdles for research in several ways (Raufaste-Cazavieille et al., 2022).Firstly, limited accessibility to such data impedes the reproducibility and verification of study findings by other researchers, hindering the validation and robustness of predictive models (Misra et al., 2019).Secondly, the lack of transparency and standardized access procedures for private datasets introduces challenges in benchmarking and comparing different survival prediction models (Raufaste-Cazavieille et al., 2022).Lastly, the exclusivity of private databases may contribute to a potential bias in research outcomes, as the diversity and representativeness of the data are often compromised which impacts the generalizability of survival predictions to broader patient cohorts (Boffa et al., 2021).
Public access to databases enables researchers to create survival benchmark datasets that fosters the development of survival prediction models (Liu et al., 2018;Rahimi et al., 2023).However, many researchers develop datasets without making them  public which hinders transparency and the broader scientific community progress (Weston et al., 2019).The lack of shared data and presence of multiple datasets associated with a single disease pose a notable challenge in survival prediction.For instance, it hinders the establishment of standardized testing and benchmarking procedures for newly proposed survival prediction methods, leading to ambiguities in identifying the most advanced techniques (Wissel et al., 2022).Moreover, recognizing the need for standardization in benchmarking survival prediction models, Wissel et al. (2022)  contributing to a more uniform and transparent benchmarking framework within the survival prediction landscape.Particularly, here we emphasize the use of these datasets for benchmarking in addition to newly created datasets to have unified benchmarking for cancer-specific survival prediction models.
. RQ V, VI: survival prediction data modalities and utilization of their combinations for disease and survival endpoints specific predictors development Following the objective of research question V, the primary focus of this section is to investigate and provide a comprehensive summary of the various data modalities utilized in the development of diverse survival predictors.To address research question V, it describes the distribution of data modalities across predictors associated with four distinct survival endpoints, and 44 different diseases.Furthermore, in response to research question VI, it furnishes information regarding the specific clinical features utilized by various survival prediction studies.
The variability in omics-type selection is not solely bound to diseases but notably varies across a wide spectrum of survival endpoints.Figure 7 shows the counts of different omics types that have been utilized for different survival endpoints prediction.In the context of OS prediction, mRNA, miRNA, methylation, and CNV have been primarily utilized in more than 31 studies, with 10 studies based on proteomics, mutation, and metabolic data.However, in terms of DFS and PFS the selection of omics types appears less distinct.These endpoints have been frequently studied in conjunction with OS, predominantly utilizing mRNA, miRNA, and methylation data.This combination suggests a commonality in the predictive factors across these survival endpoints, indicating potential interconnections or shared biological processes.
Clinical data modality has been utilized in 42 different studies.However, in this modality number of features varied from study to study and it is still unclear which particular set of features is .

Data modality References
Frontiers in Artificial Intelligence frontiersin.org

FIGURE
Distribution of omics data modalities across a diverse set of diseases.Bar heights represent the counts of each data modality with respect to disease specific published research papers.For instance, CNV has been used in six papers related to breast cancer, mRNA has been used in seven breast cancer papers and so on.

FIGURE
Distribution of di erent omics modalities with respect to survival endpoints.
Frontiers in Artificial Intelligence frontiersin.orgmost important.To perform an in-depth analysis, which study utilized which subset of features across diverse cancer subtypes and heart diseases, a comprehensive collection of clinical features is presented in Table 6.In order to better understand and discern the trends in clinical features across diverse diseases, hereby they are placed in seven different categories i.e., demographic features (6), disease-specific clinical markers (71), treatment-related features (17), laboratory and biomarkers (48), comorbidity and lifestyle factors (18), and other factors (15).
A closer look at the clinical features across diverse diseases reveals a consistent set of fundamental demographic features i.e., age and gender which are prevalent in nearly all studies (Hathaway Q. A. et al., 2021;Unterhuber et al., 2021;Feng et al., 2022;Redekar et al., 2022;Li et al., 2023;Wang X. et al., 2023).Beyond demographic features, disease-specific features also play critical role for disease-specific survival prediction.For instance, cancerrelated studies invariably focus on tumor stage, histological type, and treatment specifics, underlining the critical role of diseasespecific clinical markers in prognosis (Lee et al., 2023;Pellegrini, 2023).
Treatment-related features such as chemotherapy, radiotherapy, and immunotherapy, are particularly evident in cancer subtypes specific studies which reflect the profound influence of therapeutic interventions on survival outcomes (Othman et al., 2023;Wang X. et al., 2023).Moreover, the recurrent inclusion of lifestyle and comorbidity factors ranging from smoking history and BMI to hypertension and diabetes across multiple diseases underlines their pervasive impact on prognostic modeling (Hathaway Q. A. et al., 2021;Bhat and Hashmy, 2023).These lifestyle and comorbidity features show the complex relationship between individual health choices and their potential influence on survival outcomes.
. RQ VII: feature engineering trends across data modalities and disease-specific survival predictors This section addresses research question VII by investigating the application of feature engineering methods in survival prediction studies across a variety of diseases.This will help researchers to analyze and understand trends of feature engineering techniques in disease or endpoint specific survival prediction pipelines.Additionally, it delves into the trends in diverse feature engineering methods and their relevance to clinical and multiomics data modalities.This investigation aims to reveal trends and patterns in the dynamic interplay between feature engineering methods and the specific characteristics of different data modalities, and survival endpoints.
A comprehensive analysis of feature engineering methods across a range of disease-specific survival prediction studies unveils that supervised methods, such as Cox regression, L1 regularized Cox regression, and RSF algorithm, have been prevalent in diseases like ASCVD, trauma, and ovarian cancer (Abdelhamid et al., 2022;Zhang S. et al., 2022).On the other hand, network based methods including NBS and WGCNA, have been applied in diseases like KIRP, and hepatocellular carcinoma, which shows the significance of network structures in certain medical contexts (Wang X. et al., 2023).Univariate analyses, including ANOVA, chi-squared, and univariate Cox regression, have been prevalent in diseases such as pancreatic cancer and heart failure, underscoring the significance of statistical testing in identifying relevant features (Moreno-Sanchez, 2023;Zhou et al., 2023).Furthermore, dimensionality reduction methods such as PCA, and NMF have been consistently used across various diseases namely, ovarian cancer (Zhang S. et al., 2022), lower grade glioma (Wu et al., 2022), colon adenocarcinoma (Lv et al., 2020), bladder and breast cancers (Tang et al., 2021;Lin et al., 2022).In addition, the potential of AEs, and VAEs have also been explored in diseases like glioblastoma multiforme, breast cancer, pan-cancer, and Lung Adenocarcinoma for feature integration and dimensionality reduction (Benkirane et al., 2023;Bhat and Hashmy, 2023;Hao et al., 2023).
While feature engineering methods exhibit specificity tailored to distinct diseases, their efficacy is influenced by the inherent characteristics of the utilized data (Jiang et al., 2017).This raises the pertinent question of which particular feature engineering method proves most effective in the context of clinical and multiomics datasets.A thorough analysis of feature engineering methods and their applicability with respect to clinical and multiomics datasets reveals that methods like Cox regression, CCA, AIC, and ANOVA have been quite widely utilized in studies involving only clinical data (Zeng et al., 2021;Moreno-Sanchez, 2023;Qian et al., 2023;Wang J. et al., 2023).These methods have been applied to clinical data for multiple reasons for instance, such methods are interpretable which is important to gain meaningful insights  (Chai et al., 2021a) variance (Zhao L. et al., 2021), DEOD (Lee et al., 2023), have been utilized to handle multiomics to capture important interactions among the features and to integrate cross modalities properly.Particularly, here methods such as AEs and VAEs play a significant role and recent studies also show a growing interest in using such methods for dimensionality reduction and feature integration by such methods for multiomics and clinical datasets i.e., AEs (Baek and Lee, 2020;Jiang et al., 2020Jiang et al., , 2022;;Li et al., 2020;Lv et al., 2020;Wang et al., 2020;Yang H. et al., 2020;Owens et al., 2021;Wu and Fang, 2022), and VAEs (Tong L. et al., 2020;Hira et al., 2021;Zhang X. et al., 2021;Benkirane et al., 2023).
Although the selection of a feature engineering method is tied to the characteristics of the disease and the nature of the data (Dong and Liu, 2018), there is no significant evidence to suggest that it is substantially impacted by survival endpoints such as DFS, PFS, BC, and OS.This assumption arises due to the absence of a consistent pattern in feature engineering method selection across different survival endpoints.Studies, such as Lv et al. (2020), Tang et al. (2021), and Manganaro et al. (2023), demonstrate a varied use of feature engineering techniques irrespective of the specific survival endpoints (DFS, PFS, BC, or OS).This lack of uniformity implies that feature engineering method selection is driven more by the unique characteristics of the data and disease than by the nature of the survival endpoint itself.
On the basis of various trends and patterns it can be concluded that for heart diseases, univariate analyses and supervised feature engineering methods have been utilized.Conversely, in terms of cancer subtypes a mixture of dimensionality reduction methods is observed with a recent trend toward the AEs.In terms of survival datasets, the prime focus has been to use supervised methods for clinical data and multiple dimensionality reduction methods for multiomics data.Moreover, there are no conclusive remarks that feature engineering methods get affected by the survival endpoints, as the current literature also suggests a varied use of feature engineering methods regardless of the survival endpoints.

. RQ VIII: survival prediction methods insights and distribution across disease types and survival endpoints
In pursuit of addressing research question VIII, this section presents an overview and insights about statistical, ML, and DL algorithms that have been utilized in existing survival prediction pipelines.It succinctly examines their emerging trends across diseases and survival endpoints.This exploration aims to empower researchers in identifying gaps within disease-specific and survival endpoint-focused studies, ultimately contributing to the enhancement of survival predictive pipelines.
It can be seen in Figure 8, diverse types of methods that have been utilized in survival predictive pipelines can be categorized into three different categories i.e., statistical, ML, and DL.Statistical methods are broadly classified into three different categories i.e., parametric, semi-parametric, and non-parametric models.Parametric methods make assumptions about the survival time distribution (Lee and Wang, 2003;Kubi et al., 2022).Parametric methods include exponential, Weibull, log-normal, Weibull, gamma models, and so on (Ishak et al., 2013;Kubi et al., 2022).Comparatively, semi-parametric methods make no assumptions about the shape of the baseline hazard function (non-parametric) (Kleinbaum and Klein, 1996).Rather, these methods assume a specific functional form for the effect of covariates (parametric) (Sinha and Dey, 1997).In comparison, non-parametric methods do not take into account assumptions about the underlying distribution of survival times and the shape of the hazard function.These methods include Kalpan-Meier, Nelson-Aalen, Breslow, Gehan-Eilcoxon, and life table methods (Stevenson and EpiCentre, 2009).Some statistical methods (i.e., COX-PH) have certain disadvantages with multiomics based survival prediction (Lee and Lim, 2019).For instance, COX-PH assumes linear relationships among variables and fails to capture complex and non-linear data patterns (Therneau et al., 2000).These methods perform poorly on high dimensional data where the number of features is larger than the number of samples.This specific gap is filled by the emergence of AI based models.Various ML models are utilized   for survival analysis such as random survival forest (Ishwaran et al., 2008), and boosting-based methods (Binder and Schumacher, 2008).Shivaswamy et al. (2007), Van Belle et al. (2007), and Khan and Zubek (2008) proposed ranking and regression-based survival SVM for survival prediction while handling right censored data.Particularly, survival SVM is used in three ways for survival prediction i.e., ranking, regression, and combined.Ishwaran et al. (2008) proposed RSF where log-rank test is utilized for the splitting as compared to the Gini impurity of the classical random forest models.DL methods are utilized in two ways to model survival prediction tasks i.e., continuous and discrete time (Kvamme et al., 2019).Models like CoxCC and time (Kvamme et al., 2019), piecewise constant hazard or PEANN (Fornili et al., 2014), andDeepSruv (Katzman et al., 2018) are utilized for continuous survival time prediction.Whereas, Nnet-survival (Gensheimer and Narasimhan, 2019), Nnet-survival probability mass function (PMF) (Kvamme and Borgan, 2019b), DeepHit and DeepHit Single (Lee et al., 2018), multi-task logistic regression (MTLR) (Yu et al., 2011;Fotso, 2018), and BCESurv (Kvamme and Borgan, 2019a) are utilized to predict survival in a discrete-time setting.

. RQ IX: open source tools and libraries potential for development of survival prediction pipelines
Following the objective research question IX, this section summarizes details of open-source libraries and source codes of existing survival predictors.This comprehensive information will facilitate researchers to build upon existing work, fostering a collaborative environment and accelerating the development of robust and effective survival prediction models.
Notably, addressing the lack of interpretability or explainability in the previously discussed libraries, Spytek et al. (2023) introduced Survex.This library allows researchers to analyze the features responsible for a specific event by offering different methods for both local and global explanations of various survival prediction models.
The selection of a specific library is inherently subjective and depends on factors such as the preferred development platform, choice of survival prediction models, and the specific research question in hand.Therefore, recommendations are made based on the number of survival prediction models and evaluation measures each library offers.For Python, Lifelines (Davidson-Pilon, 2019) and Pycox (Kvamme et al., 2019) are recommended, with Lifelines (Davidson-Pilon, 2019) providing a diverse range of statistical and ML models, while Pycox (Kvamme et al., 2019) is specialized in DL models.Additionally, for R, mlr3proba (Sonabend et al., 2021) is recommended, as it offers a variety of statistical and ML models for survival prediction.Ultimately, selecting a library aligned with individual research needs not only streamlines the development process but also contributes to the overall reliability of survival prediction models.
. RQ X: strategies for assessing survival predictors: unveiling common evaluation measures The main objective of this section is to provide a concise overview of research question X, which focuses on  the commonly employed evaluation measures for survival predictive pipelines.
Table 11 shows a compilation of 18 distinct evaluation measures that have been commonly used to evaluate survival prediction pipelines.The survival prediction pipelines can be categorized into two distinct classes namely survival outcome prediction (Lynch et al., 2017) and survival prediction (Tarkhan et al., 2021).Details related to these categories is provided in the background section.Out of 18 evaluation measures mentioned in Table 11, a set of 10 evaluation measures have been employed to assess the performance of survival outcome prediction models.In addition to the aforementioned measures, 8 other evaluation measures have been utilized to assess the performance of survival prediction models.
In survival prediction category based evaluation measures, the objective is to capture two distinct characteristics namely, calibration and discrimination (D'Agostino and Nam, 2003;Simino, 2009).Specifically, calibration refers to how well the predicted probabilities of survival align with the actual observed survival rates over time (D'Agostino and Nam, 2003).Under this paradigm most widely used evaluation measures are BS (Schumacher et al., 2003), IBS (Gerds and Schumacher, 2006), TD-ROC (Heagerty et al., 2000), andDCA (Vickers andElkin, 2006).Discrimination paradigm based evaluation measures capture differentiation between individuals with different survival times.Under this paradigm most widely used measures are C-index (Hartman et al., 2023), AUC-ROC (Terrematte et al., 2022), and likelihood ratio (Murphy, 1995).
. /frai. .On the other hand objective of survival outcome prediction evaluation measures is to assess diverse characteristics of a model i.e., efficacy of the model, overall accurate predictions, biasness toward type I or type II errors (Hao et al., 2023;Lee et al., 2023).Specifically, accuracy and F1 score are used to measure overall accurate predictions, precision, and recall examine the model's biasness with respect to type I and type II errors (Zeng et al., 2021;Wang et al., 2022).Additionally, MCC provides a comprehensive assessment, taking into account overall accurate predictions, and errors (Othman et al., 2023).In addition, AUC-ROC assesses the predictive potential of a model by analyzing the true positive rate (TPR) and true negative rate (TNR) at different thresholds (Hao et al., 2023;Pellegrini, 2023;Qian et al., 2023).

. RQ XI: publisher and journal-wise distribution of research papers
This section addresses research question XI by presenting the distribution of survival prediction literature across diverse journals and publishers.Overall, this analysis not only enables researchers to strategically position their work but also offers opportunities for interdisciplinary collaboration, promoting a more interconnected and dynamic research landscape within the domain of survival prediction.
In Figures 9, 10, the distribution of survival prediction literature is presented based on journals and publishers.The studies have been published in 25 different publishers, including but not limited to Springer, Elsevier, Oxford Press, and BioMed Central.Notably, around 30 out of 90 survival prediction studies have been disseminated through Springer, and BioMed Central.Furthermore, Elsevier has contributed to the field by publishing 10 relevant papers in recent years.Particularly, these studies have been published in more than 50 different conferences/journals, which shows the diversity of the survival prediction landscape.

Discussion
The field of disease survival prediction has become a pivotal aspect of effective healthcare, especially within the domain of precision medicine.Recognizing the significant variability present among patients within specific diseases, there is an increasing demand and development for disease specific survival predictors.Our analysis reveals that researchers place a profound emphasis on predicting survival in cancer as compared to other diseases, .

TABLE
A summary of evaluation measures used in survival prediction and survival outcome prediction pipelines.
DCA  et al., 2023;Hao et al., 2023;Othman et al., 2023;Wang J. et al., 2023 Useful when the cost of false positives is high, as it focuses on the accuracy of positive predictions.
Recall 6 Chauhan et al., 2023;Hao et al., 2023;Othman et al., 2023;Wang J. et al., 2023 Emphasizes the ability of the model to capture all positive instances, important for sensitive scenarios.and there are compelling reasons behind this focus.First, cancer exhibits significant variability from one patient to another as compared to other diseases, which highlights the imperative need for cancer survival prediction to explore and comprehend the heterogeneity of cancer.Second, cancer is a leading cause of death worldwide, and effective survival prediction can aid in early detection and intervention, potentially saving lives.Third, a huge amount of data sources are developed to make cancer-related data publicly available to accelerate and optimize cancer-related research.Furthermore, to analyze the trajectory of the disease, researchers place great focus on studying different survival endpoints that suit the respective research setting i.e., treatment, progression, recurrence, and death.Among four different survival endpoints i.e., OS, DFS, BC, and PFS, OS is often emphasized more in survival prediction studies.Despite the prime focus on OS, the significance of other survival endpoints in understanding disease trajectories cannot be understated.These survival endpoints help to analyze different characteristics of diseases such as understanding treatment efficacy and durability, treatments that not only extend life but also effectively manage the course of the illness, and markers responsible for disease recurrence.The lack of research in other survival endpoints opens up new research avenues for the AI experts to develop novel methods that can help explore various characteristics related to disease.
Although both public and private databases have been utilized in survival prediction studies, yet the preference for public databases stems from their accessibility and the wealth of information they provide.For instance, TCGA (Tomczak et al., 2015) offers a vast array of genomic and clinical data across different cancer types.This invaluable resource aids researchers in developing accurate survival prediction models.Likewise, GDC (Jensen et al., 2017) and GEO (Clough and Barrett, 2016) offer comprehensive datasets that encompass a wide range of diseases, making them appealing choices for various research endeavors.Furthermore, a crucial observation regarding private data sources is that they are not universally accessible.This argument is supported by the limited accessibility of omics datasets related to cardiovascular diseases.Despite a singular study employing omics data for survival prediction in cardiovascular diseases, the challenge lies in the difficulty of retrieving the original data.Authors often refrain from sharing their datasets, and obtaining access to databases requires extensive proposals, adding a layer of complexity to the development of novel survival prediction pipelines for cardiovascular diseases.This obstacle may impede the advancement of innovative survival prediction pipelines for cardiovascular disease.
Overall, the use of omics and clinical data in survival prediction tools marks a significant stride toward precision medicine.The distribution of omics types in survival prediction studies reveals a preference for mRNA, methylation, microRNA, and CNV across various cancer subtypes.In addition, the limited number of multiomics based survival prediction studies in cardiovascular diseases hinders definitive conclusions on the importance of specific omics types.Disease-specific patterns highlight the importance of tailored clinical markers, prominently seen in cancer studies with a focus on tumor stage and histological type.Treatment-related features, notably chemotherapy and radiotherapy, underscore the impact of therapeutic interventions on survival predictions.Moreover, clinical features along with omics data with diverse molecular aspects are utilized together to improve the performance of survival prediction models.Diverse survival prediction research accentuates the pivotal role of leveraging patient information, such as medical history, demographics, disease-related features, and diagnostic records.This trend reflects an increasing recognition of the potential of clinical data in not only understanding disease progression but also in guiding personalized treatment strategies and enhancing patient care.A recent benchmark study on survival prediction models with multiomics and clinical data also shows the significant role of clinical data in survival prediction across multiple cancer subtypes (Herrmann et al., 2021).In addition, our analysis reveals that increasing the total number of data modalities does not necessarily offer improved survival predictions, yet data modalities are quite specific to the disease and survival endpoints.Therefore, the selection of data modalities should be made very carefully as rather than improving the overall performance it can induce undesirable noise in the analysis.
One of the common problems in survival analysis is data censoring (Leung et al., 1997).Censoring arises when there is incomplete information about the time points and/or events of some subjects in a study.There are different types of censoring i.e., (I) Right Censoring is the most common type of data censoring, where an event does not occur for some subjects by the end of study or by the last time point at which data is collected.For example, a subject withdraws from the study or there is a lost follow up for a specific subject (II) Left Censoring is the least common type of censoring where the event may occur before the start of the study or during the data collection phase.(III) Interval Censoring arises when the event of interest occurs in a time interval but the exact time point is not known.In survival analysis, three assumptions are taken into account to infer censored data i.e., (I) Independent Censoring: assumes that the censoring times for multiple subjects are independent of each other.(II) Random censoring assumes that the time t at which individuals are censored must be random and the failure rate for subjects who are censored is assumed to be equal to the failure rate for subjects who remained in the risk set who are not censored.(III) Non-informative censoring occurs if the distribution of survival times (T) provides no information about the distribution of censorship times (C), and vice versa.Although, data censoring is quite important in terms of survival prediction, yet it has been discussed and dealt with properly in the existing studies.We recommend to incorporate comprehensive details of data censoring in future survival prediction studies.Particularly details on how each type of data censoring is handled should not be neglected.
Our analysis of the utilization of feature engineering methods raises two crucial points.First, even though a plethora of methods have been already tested for various survival prediction studies, autoencoder based methods tend to reduce the dimensionality of omics data modalities more efficiently.In addition, the rest of the methods work much better with clinical features.The success of feature engineering approaches is contingent upon the chosen technique with the inherent properties of the data.This highlights the importance of large-scale benchmark studies in guiding the selection of feature engineering strategies for the development of accurate predictive pipelines.
In end-to-end survival predictive pipelines, researchers have utilized methods from three different families namely statistical (Hazard models, Kaplan-Meier Estimator, Log-Rank Test, and Frailty Models) (Kleinbaum and Klein, 1996), ML (Random Forests, Support Vector Machines, Gradient Boosting Machines, and Nearest Neighbors) (Ishwaran et al., 2008;Ma et al., 2022), and DL (CoxNnet, DeepSurv) (Ching et al., 2018;Katzman et al., 2018).Statistical methods are unable to extract complex non-linear A plethora of statistical, machine ML, and DL based approaches have been utilized for survival prediction.However, our analysis reveals that statistical methods have significant disadvantages: they depend on strong assumptions like the proportionality of hazards, struggle with high-dimensional data, and complex non-linear relationships.Additionally, they are sensitive to outliers and missing data.In contrast, DL methods such as DeepSurv, Survival Convolutional Neural Networks (SurvCNN), BCESurv, SurNnet, and autoencoders are preferred due to their ability to handle high-dimensional data, model non-linear relationships, and integrate heterogeneous data sources, making them more robust and accurate for multiomics based survival prediction.
RQ IX: Open source tools and libraries potential for development of survival prediction pipelines For survival analysis in Python, Lifelines and Pycox are commonly used packages.Lifelines provides a diverse range of statistical and ML models, while Pycox specializes in DL models.Alternatively, mlr3proba is also used which can be a good choice for R users, offering a variety of statistical and ML models for survival prediction.
RQ X: Strategies for assessing survival predictors: unveiling common evaluation measures In our findings on survival prediction evaluation, we discovered two key assessment paradigms: calibration and discrimination.Calibration evaluates the alignment between predicted and observed survival rates over time using measures like BS, IBS, TD-ROC, and DCA.Meanwhile, discrimination measures such as C-index, AUC-ROC, and likelihood ratio focus on distinguishing individuals with varying survival times.Additionally, for overall model evaluation, we identified accuracy, F1 score, precision, recall, MCC, and AUC-ROC as crucial metrics for assessing efficacy, prediction accuracy, and bias toward type I or type II errors.
RQ XI: Publisher and journal-wise distribution of research papers Our analysis of the survival prediction literature reveals a diverse distribution across various publishers.Among the 16 publishers identified, notable contributions come from Springer, Elsevier, Oxford Press, and BioMed Central.Around 30 out of 74 studies have been disseminated through Springer and BioMed Central, highlighting their significant presence in the field.Additionally, Elsevier has contributed 10 relevant papers in recent years.These studies have been published across more than 50 different conferences and journals, underscoring the breadth and diversity of the survival prediction landscape.
patterns that is why in current predictors focus of researchers is on ML or DL based methods (Katzman et al., 2018).In spite of the applications and usefulness of traditional ML methods, they face numerous limitations when applied to survival prediction.These limitations arise either from the inherent challenges of survival data or from the models themselves.Such limitations include censored observations (Khan and Zubek, 2008), overfitting and outliers (Biccler et al., 2020;Nariya et al., 2023), and complex relationships among variables.ML models also suffer from outliers in survival prediction datasets (Biccler et al., 2020).DL methods address many of these limitations through their advanced architectures and ability to learn complex patterns from large datasets.DL models such as DeepSurv, extend the Cox proportional hazards model by learning non-linear representations of the covariates and handling censored data effectively (Katzman et al., 2018).This model leverages the strengths of neural networks to capture complex relationships and interactions between variables, improving prediction accuracy (Katzman et al., 2018).In addition, there are some advantages of ML methods as well, i.e., they perform better even on small datasets while DL methods require large data (LeCun et al., 2015).Similarly, ML methods decisions are explainable and DL methods decisions are black box (Dwivedi et al., 2023).Although, a research comunity is focusing on unveiling black box decisions of predictors.However, in survival prediction, most of predictors do not have explainability component (Krzyziński et al., 2023).But researchers are trying to incorporate explainability methods with survival models (Krzyziński et al., 2023).While developing different data modalities based on survival predictor, predictive pipelines require dimensionality reduction methods that avoid the curse of the dimensionality problem (Feldner-Busztin et al., 2023).Although several traditional methods (PCA, LDA, TSNE, UMAP etc.) have been developed to transform data into new space that have more comprehensive patterns and less number of features.However, these methods lacks in extracting and incorporating non-linear patterns of features (Gastinel, 2012;Kirpich et al., 2018;Degenhardt et al., 2019).On the other hand in deep learning based predictive pipelines, researchers are utilizing auto-encoders that are capable of generating more comprehensive feature space by extracting both linear and non-linear patterns of features (Tan et al., 2020).Following overall pros and cons of ML and DL based predictive pipelines, new predictors can be developed by utilizing ML based methods with smaller datasets.Moreover, in these predictors rather than utilizing traditional dimensionality reduction methods, autoencoders can be utilized.Moreover, when data is large, it is better to develop DL predictors but these predictors must be enriched with explainability methods.
With an aim to evaluate the performance of predictive pipelines, diverse types of evaluation measures have been developed.Each evaluation measure addresses a specific aspect of survival prediction models, precluding the possibility of any single metric being universally ideal for a comprehensive evaluation of survival prediction.For instance, C-index estimates the robustness and discriminatory power of the survival prediction model.In addition, BS and IBS measure the accuracy of a model on time distribution.Moreover, log-rank p-value evaluates the potential of the model by testing the differences in different survival groups.Although these measures are the most commonly utilized, there are diverse other evaluation measures for similar purposes i.e., restricted mean survival time (RMST), odds ratio (Pellegrini, 2023), Kappa for inter-rater reliability (Zheng et al., 2021), integrated absolute error (IAE), integrated square error (ISE), mean absolute error (MAE), integrated AUC (IAUC) timedependent integrated discrimination improvement, and timedependent net reclassification improvement (NRI).Furthermore, while these individual measures provide valuable insights, it is noteworthy to mention that their collective application offers a more comprehensive evaluation.Therefore, we recommend utilizing multiple evaluation measures to assess discrimination and calibration of survival prediction models.

Reccomendations
With an aim to expedite and enhance research in survival prediction.Hereby, on the basis of Table 12, we summarize some important recommendations for future survival prediction studies.
We highly recommend leveraging open-source tools and libraries for developing survival prediction pipelines.Pycox (Kvamme et al., 2019), Lifelines (Davidson-Pilon, 2019), and scikitsurvival (Pölsterl, 2020) are excellent choices, offering a rich array of pre-implemented statistical, ML, and DL models.In addition, selecting appropriate evaluation measures is paramount.We advise researchers to carefully choose measures aligned with their research question and survival prediction task.Utilizing multiple measures ensures a comprehensive assessment of model performance i.e., C-index, IBS, BS, and ROC (Schumacher et al., 2003;Gerds and Schumacher, 2006;Terrematte et al., 2022;Hartman et al., 2023).
Integration of clinical and omics data is key to improving prediction accuracy.Researchers should explore diverse data sources and consider disease-specific patterns and survival endpoints to enhance the predictive power of their models.Researchers should carefully use feature engineering methods tailored to their data characteristics.Autoencoder-based dimensionality reduction for omics data and traditional methods for clinical features can significantly enhance predictive pipelines.Particularly it is important to note that addressing data censoring transparently is essential for model reliability.We recommend providing comprehensive details on censoring types and handling methods to ensure the robustness of survival prediction models.
Both traditional ML and DL methods offer unique advantages.Researchers should explore the strengths of each approach, with a particular focus on DL methods like DeepSurv for capturing complex relationships.In addition, models like Transformers can also be used to deal with clinical data which shall be an interesting research perspective in future (Pant et al., 2023).Enriching survival prediction models with explainability methods is crucial for improving interpretability.By understanding and unveiling model decisions, researchers can enhance trust and adoption in clinical settings.By following such recommendations, researchers can contribute to the development of robust and effective survival prediction models, ultimately facilitating personalized treatment strategies and improving patient care across various disease.

FIGURE
FIGUREPRISMA flow diagram: a step-by-step process for articles search and their inclusion or exclusion criteria to generate a set of studies for further in-depth trends analysis.The included papers are collected from Jan to Jul .

FIGURE
FIGURECancer subtypes coverage based on pan-cancer or individual subtype settings.

FIGURE
FIGURESurvival endpoint distribution across diverse studies.

FIGUREFrontiers
FIGUREHierarchal illustration of survival prediction methods under three di erent categories.

FIGURE
FIGUREJournal-wise distribution of articles.

FIGURE
FIGUREPublisher-wise distribution of articles.
TABLE The scope and limitations of current survey papers.
Different concepts like image segmentation and feature extraction are discussed in detail with less emphasis on their utilization in ML or DL-based survival prediction.In addition, very few studies are referred related to pancreatic cancer survival prediction.
TABLE Distribution of survival predictors across individual diseases.
ukbiobank.ac.uk/UK Biobank is a large-scale prospective cohort study that aims to improve the prevention, diagnosis, and treatment of a wide range of illnesses, including cancer, by providing a rich resource for researchers to study the complex interplay between genetic, environmental, and lifestyle factors.
TABLE Diverse collection of clinical features utilized in various survival prediction studies.
TABLE Distribution of survival predictors across diverse diseases.
TABLE Summary of open-source survival prediction methods in existing studies.
TABLESurvival analysis libraries, models, and evaluation metrics.
of the past three years shows that survival prediction models have been developed majorly for 36 distinct cancer subtypes, as outlined in Table3.Additionally, the primary focus of this research has been on overall survival (OS) as the clinical/survival endpoint.The selection of data modalities depends on the disease and survival endpoint.For example, in cancer subtypes, mRNA, miRNA, and methylation data are most frequently used.Similarly, these modalities are commonly employed for OS endpoint, while distinct patterns are less evident for DFS, PFS, and BC endpoints.In addition, clinical features such as demographic, histological, lifestyle, and comorbidity are of high importance and are used commonly in survival analysis.RQ VII: Feature engineering trends across data modalities and disease-specific survival predictorsIn terms of cardiovascular diseases, univariate analyses, and supervised feature engineering methods i.e., Cox regression, L1 regularized Cox regression, and RSF algorithm, are common, while cancer research based on multiomics data opts for dimensionality reduction methods like PCA, AEs, and VAEs.There is no consistent or apparent effect of survival endpoints on the selection of feature engineering method in the published literature.RQ VIII: Survival Prediction Methods Insights and Distribution Across Disease Types and Survival Endpoints